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Diff. Patch. Change. Delta. A. 

Whatever you call it, diffs are pretty much the aggregate output of any developer’s day. A 
bunch of deletions and additions from a set of files that can be codified in a .patch file as a 
bunch of pluses, minuses and context indicators. 

a/todo.md 

+++ b/todo.md 
@@ - 1,5 + 1,4 @@ 

# TODO 

buy anniversary present 

build arduino-based vaucanson's duck 

-blog on git diffs 

fork webkit 


I’m still undecided as to whether it’s depressing or amazing that our life’s work can be so simply 
encoded. 

Regardless, git has given us a bunch of neat ways at looking at these diffs. The following post is 
a quick primer on the various incantations of git diff. Note that I’m going to be focusing on 
different output format for viewing diffs. There’s also a whole range of ways you can specify 
different commits and paths to diff, but that’s a topic for another day. 

git diff 

You’re probably already familiar with the built in git diff command. By default, diff will show 
you changes to tracked files in your work tree that haven’t yet been added to the index. For 
example: 

$ cat 2cities.txt 

It was the best of times, 
it was the blurst of times. 

$ sed -i '.tmp' ' s/blurst/worst/ ' 2cities.txt 

$ git diff 

It was the best of times, 

-it was the blurst of times. 

+it was the worst of times. 
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This works fine, but it’s unfortunate that it shows up as one removed line followed by one added 
line. All I did was change one word! 

git diff --color-words 

git diff also has a special mode for highlighting changes with much better granularity: — color- 
words. This mode tokenizes added and removed lines by whitespace and then diffs those. 

$ git diff — color-words 

It was the best of times, 
it was the blurstworst of times. 
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That’s an improvement! Now we’re down to the individual words that have changed. But if you 
look closely, we’ve really only changed three characters: “blu” has become “wo”. We can do 
even better. 

git diff-highlight 

If you clone the git source you’ll find a sub-directory called contrib. It contains a bunch of git- 
related tools and other interesting bits and pieces that haven’t yet been promoted to git core. 
One of these is a perl script called diff-highlight. diff-highlight pairs up matching lines of diff 
output and highlights sub-word fragments that have changed. 

$ git diff | ~/src/git/contrib/dif f-hightlight/dif f-highlight 

It was the best of times, 

-it was the blurst of times . 

+it was the worst of times. 


Sweet! Now we’ve pared down our diff to the smallest possible change. 

If you like this fine-grain way of looking at diffs, you can add a shell function alias to 
-/ . git/config to handle the piping to diff-highlight for you: 

1 [alias] 

2# assumes git. git clone at ~/src/git 

3diff-highlight = "!f() { git diff \"$@\" | ~/src/git/cont rib/diff-highlight/diff-high 
Then, you can get a nicely highlighted diff by simply invoking git diff-highlight. 

Diffing binary files 

If you make changes to a binary file (such as an image, pdf or archive) the default output from 
git diff is pretty uninteresting. 

$ git diff 

Binary files a/script. pdf and b/script.pdf differ 


However, git does have a nifty feature that allows you to specify a shell command to transform 
the content of your binary files into text prior to performing the diff. It does require a little set up 
though. 

First you need to specify a texteonv filter describing how to convert a certain type of binary to 
text. I’m using a simple utility called pdftohtml (available via homebrew) to convert my PDFs into 
human readable HTML. You can set this up for a single repository by editing your .git/config file, 
or globally by editing -/ . gitconf ig. 

.git/config 

l[diff "pdfeonv"] 
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2textconv=pdftohtml -stdout 

Then all you need to do is associate one or more file patterns with our pdfconv filter. You can 
do this by creating a . gitattributes file in the root of your repository. 

.gitattributes 


l*.pdf diff=pdfconv 

Then, the next time we attempt to do a diff of a binary file, we get much more meaningful 
output: 

$ git diff 

<HTML> 

<BODY> 

Interior. Room filled with monkeys bashing on typewriters. Mr. Burns 
tears a sheet of paper from a typewriter .<br> 

-Burns: It was the best of times., it was the blurst of times ?<br> 

+Burns: It was the best of times., it was the blurst of times? You 

stupid monkey !<br> 

Burns balls up the sheet and throws it at the monkey. <br> 

< / BOD Y> 

</HTML> 

And remember, this is just regular diff output, so we can redo our dif f -highlight trick to get 
something a bit more palatable: 

$ git dif f-highlight 

Interior. Room filled with monkeys bashing on typewriters. Mr. Burns 
tears a sheet of paper from a typewriter .<br> 

-Burns: It was the best of times., it was the blurst of times ?<br> 

+Burns: It was the best of times., it was the blurst of times? You 

stupid monkey !<br> 

Burns balls up the sheet and throws it at the monkey. <br> 

The same technique can be applied to get useful diffs from all sorts of binary files, for example: 

• zips, jars and other archives: using unzip -I (or similar) in place of pdf 2html will show you 
paths that have been added or removed between commits 

• images: exiv2 can be used to show metadata changes such as image dimensions 

• documents: conversion tools exist for transforming .odf, .doc and other document formats 
to plain text. In a pinch, strings will often work for binary files where no formal converter 
exists. 

Want to talk more about git or Atlassian Stash? Leave a comment or hit me up on twitter 
@kannonboy 
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Comments (3) 



“By default, diff will show you the difference between unstaged changes to currently 
tracked files, and your current HEAD.” 
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Nope! 

By default, diff compares the files in your working area with the files in the staging area 
(aka “the index”). 

To compare the index with HEAD you can say “git diff -cached”. 

By Gustavo Chaves on June 27, 2013 / Reply 

Good catch Gustavo - I’ve updated the post. Thanks! 

By Tim Pettersen on June 27, 2013 




... and to show what has been staged: “git diff -staged” 
By Daniel S. reichenbach on June 28, 2013 / Reply 
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